The recently identified etiological agent of the severe acute
respiratory syndrome (SARS) belongs to Coronaviridae (CoV), a family
of viruses replicating by a poorly understood mechanism. Here, we
report the crystal structure at 2.7-Å resolution of nsp9, a hitherto
uncharacterized subunit of the SARS-CoV replicative polyproteins.
We show that SARS-CoV nsp9 is a single-stranded RNA-binding
protein displaying a previously unreported,
oligosaccharide/oligonucleotide fold-like fold. The presence of this
type of protein has not been detected in the replicative complexes of
RNA viruses, and its presence may reflect the unique and complex CoV
viral replication/transcription machinery.


In 2003, a human coronavirus (CoV) was identified as the
causative agent of a form of atypical pneumonia: severe acute 
respiratory syndrome-CoV (SARS-CoV) (1-5). Coronaviridae 
have the longest known single-stranded (ss)RNA genome (27- 
31.5 kb), with a complex genetic organization and sophisticated 
replication/transcription cycle (6, 7). Twenty-eight proteins are 
predicted to be encoded by the SARS-CoV genome (8, 9). The 
nonstructural (nsp) or “replicase” proteins of CoVs are derived
from an unusually large replicase gene of >20 kb that consists
of two large ORFs (ORFs 1a and 1b). Translation of this
replicase gene from the incoming genomic RNA is the first step
in CoV genome expression and includes a −1 ribosomal frameshift
to express the ORF1b-encoded polypeptide. Translation
products are the pp1a polyprotein (>4,000 amino acids) and the
C-terminally extended pp1ab polyprotein (>7,000 amino acids),
which are both cleaved by two or three ORF1a-encoded viral
proteinases (10). Most of these replicase cleavage products
assemble into a membrane-associated viral replication/ 
transcription complex. Among other components, this complex 
includes a set of relatively small polypeptides (nsp6 to nsp11) 
encoded by the 3′ region of ORF1a, for which no predicted or
proven function has been assigned. For the mouse hepatitis CoV, 
several of these cleavage products were reported to colocalize 
with other components of the viral replication complex in the 
perinuclear region of the infected cell (11), suggesting their 
involvement (directly or indirectly) in viral RNA metabolism. 

As part of a viral structural genomics program (12), we have 
cloned the 28 gene products of SARS-CoV and expressed them 
either as full-length proteins or as (predicted) functional do- 
mains. The determination of the three-dimensional structures of 
these gene products is expected to facilitate and accelerate 
discovery of drugs against this emerging and life-threatening 
pathogen. Furthermore, structural homology search is becoming 
a powerful method to infer biochemical and/or biological function
of previously uncharacterized proteins. We report here the
crystal structure of nsp9, one of the uncharacterized SARS-CoV
nonstructural proteins, as well as evidence for its function as an
ssDNA/RNA-binding protein.
Materials and Methods


Crystallization, Structure Determination, and Refinement. SARS-CoV
nsp9 was expressed, purified, and crystallized as
described (12). X-ray diffraction data were collected at 100 K at
the European Synchrotron Radiation Facility, Grenoble, France.
Native data were collected on beamline ID14-1 by using a
Quantum ADSC Q4R charge-coupled device detector. Crystals
diffracted X-rays to 2.7-Å resolution and belonged to space
group P6₁22 with unit cell dimensions a = b = 89.7 Å, c = 136.7
Å. There are two molecules per asymmetric unit, leading to a
solvent content of ~60%. The structure was solved by using
single-wavelength anomalous dispersion data (13) collected at
the peak wavelength of selenium on beamline ID14-4.
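
The quoted solvent content can be cross-checked from the unit cell with a Matthews-coefficient estimate. The sketch below is illustrative only: the function names are hypothetical, the molecular mass of about 12.4 kDa is an assumption based on a chain of roughly 113 residues (the crystallized construct may differ), and the count of 12 symmetry operators assumes a hexagonal space group of point group 622.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (cubic angstroms) of a hexagonal cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(volume: float, n_sym_ops: int, n_mol_asu: int, mw_da: float) -> float:
    """Crystal volume per dalton of protein (cubic angstroms per Da)."""
    return volume / (n_sym_ops * n_mol_asu * mw_da)


def solvent_fraction(vm: float) -> float:
    """Solvent fraction from the conventional relation 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm


A_AXIS, C_AXIS = 89.7, 136.7  # unit cell dimensions quoted in the text
N_SYM_OPS = 12                # ASSUMPTION: general positions for point group 622
N_MOL_ASU = 2                 # two molecules per asymmetric unit (from the text)
MW_DA = 12_400.0              # ASSUMPTION: approximate mass of a ~113-residue chain

volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
vm = matthews_coefficient(volume, N_SYM_OPS, N_MOL_ASU, MW_DA)
solvent = solvent_fraction(vm)
print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
```

Under these assumptions the estimate lands close to the ~60% stated above; a heavier construct (for example, one carrying a purification tag) would lower it slightly.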

Data were integrated with MOSFLM and were scaled by using 
SCALA (14). The four expected selenium sites (two in each of the 
two molecules of the asymmetric unit) were identified and 
refined by using SOLVE (15). Density modification of the experimental
maps and initial fragment building were performed with
RESOLVE (16). Model building was carried out by using TURBO-FRODO
(17) and maximum-likelihood refinement was performed
with REFMAC5 (18) using NCS restraints. Residues 3-113 could
be modeled and refined in molecule A, and residues 4-113 in
molecule B. Four sulfate ions and 31 water molecules were added
manually during refinement. Overall geometric quality of the
model was assessed by using PROCHECK (19). A total of 86.1% of
the residues were found in the most favored regions of the
Ramachandran plot, and 13.4% were in additionally allowed
regions. The solvent-accessible surface of nsp9 was calculated
and displayed by using GRASP (20) and SWISS-PDBVIEWER (21).
Fig. 1 A and B were generated by using MOLSCRIPT (22) and were
rendered by using RASTER3D (23). The alignment of Fig. 1C was
displayed by using the program ESPRIPT (24).


Surface Plasmon Resonance and Fluorescence Spectroscopy. DNA 
and RNA oligonucleotides were purchased from Life Technol- 
ogies (Grand Island, NY) and Amersham Pharmacia, respec- 
tively. Surface plasmon resonance measurements were per- 
formed on a BIAcore apparatus (Pharmacia Biosensor) by using 
the BIAlogue kinetics evaluation program (BIAEVALUATION 
v.3.1, Pharmacia Biosensor). Biotinylated oligonucleotides were
immobilized on a Sensor Chip SA according to the manufacturer’s
instructions (BIAcore).

Fig. 1. Crystal structure, sequence, and topology of SARS-CoV nsp9. (A) Ribbon representation of SARS-CoV nsp9. One molecule of the dimer is gold and the
other is cyan. Loops between strands x and y are labeled Lxy. (B) A 90° view of A. (C) Multiple alignment of nsp9 sequences from SARS-CoV [National Center for
Biotechnology Information (NCBI) accession no. AY291315] and several related CoVs (HCoV 229E, human CoV 229E, NCBI accession no. NP_073550; TGEV,
transmissible gastroenteritis virus, NCBI accession no. NP_058423; PEDV, porcine epidemic diarrhea virus CV777, NCBI accession no. NP_598309; BCoV, bovine CoV,
NCBI accession no. NP_150074; MHV, mouse hepatitis virus MHV-A59, NCBI accession no. NP_045298; and IBV, avian infectious bronchitis virus, NCBI accession
no. NP_040829). The consensus sequence (identity cutoff >70%) is displayed under the multiple sequence alignment. Dots and residues in lowercase correspond
to positions for which the residue conservation is under and above the cutoff value, respectively; positions marked by # correspond to either Asn, Asp, Glu, or
Gln; positions marked by ! correspond to either Ile or Val, and $ corresponds to Leu or Met. Residues that are conserved in all sequences are boxed in red, and
those for which conservation is >70% are boxed in yellow. For a given position, only residues homologous to the consensus are bold. The top numbers correspond
to the amino acid sequence of SARS-CoV nsp9. Secondary structure elements and loops of SARS-CoV nsp9 are numbered according to Fig. 1 and are indicated
above the alignment. (D) Schematic representation of nsp9 topology. The SARS-CoV nsp9 β-barrel structure is a concatenation of two Greek key motifs, Greek key
1 having a g− topology and Greek key 2 a g+ topology (30), resulting in a six-stranded RH-g− to g+ topology. β-Strands and α-helices are symbolized by arrows
and cylinders, respectively, and they are numbered consistently with the sequence alignment. (E) Schematic representation of the typical Greek key (g− topology)
motif found in the OB fold.

Fluorescence quenching of the single tryptophan in nsp9 was 
measured by using a Cary Eclipse (Varian) equipped with a 
front-face fluorescence accessory at 20°C, by using 2.5-nm 
excitation and 10-nm emission bandwidths. The excitation wave- 
length was 280 nm and the emission spectra were measured 
between 290 and 540 nm. Titrations were performed in a 1-ml 
quartz fluorescence cuvette containing 1 µM protein in 10 mM
Tris-HCl buffer/300 mM NaCl, pH 8.0, by the successive
addition of aliquots of appropriate nucleic acid stock solutions
(1 mM). Experimental fluorescence intensities were corrected
for dilution. Data were analyzed by plotting the relative fluo- 
rescence intensities at 340 nm at increasing concentrations of 
quencher. Dissociation equilibrium constant (KDapp) values were
determined from data fitted to a single exponential equation, by
using the PRISM 3.02 nonlinear regression tool (GraphPad, San
Diego).

Table 1. Summary of crystallographic data

                                     SeMet                    Native
Data collection
  Wavelength, Å                      0.9793                   0.9340
  Resolution range, Å                30-3.0 (3.11-3.0)        3.0-2.7 (2.84-2.7)
  No. of unique reflections          6,831                    9,345
  No. of measured reflections        97,058                   100,668
  I/σ(I)                             7.4 (6)                  9.4 (1.6)
  Multiplicity                       14.2 (14.2)              10.8 (10.2)
  Completeness, %                    99.9 (99.9 anomalous)    99.1 (99.0)
  Rmerge*, %                         8.4 (47.9)               5.6 (43.3)
Refinement
  Resolution limits, Å                                        15.0-2.7
  R factor†, Rfree‡, %                                        23.4/27.1
  rms deviation: bonds, Å/angles, °                           0.018/1.88

SeMet, selenomethionyl single-wavelength anomalous dispersion data set.
Values in parentheses are for the highest-resolution shell.
*$R_{\mathrm{merge}} = \sum_h \sum_i |I_h - I_i| / \sum_h \sum_i I_h$, where $I_h$ is the mean intensity for reflection h.
†$R\ \mathrm{factor} = \sum |F_o - F_c| / \sum |F_o|$, where $F_o$ and $F_c$ are measured and calculated
structure factors, respectively.
‡Rfree was calculated over 5% of reflections not used in the refinement.
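
As an illustration of the titration analysis described in the fluorescence methods (dilution correction followed by a single-exponential fit), here is a minimal sketch. It is not the PRISM procedure: the exact equation used is not given in the text, so the model f = plateau + span·exp(−c/k) is an assumption, as is reading the fitted constant k as the apparent dissociation constant. All function names are hypothetical and the titration data are synthetic, not measurements from this work.

```python
import math


def correct_for_dilution(intensity, start_volume_ul, added_volume_ul):
    """Scale an observed intensity back to the starting cuvette volume."""
    return intensity * (start_volume_ul + added_volume_ul) / start_volume_ul


def fit_single_exponential(conc, f_rel, k_grid):
    """Least-squares fit of f = plateau + span * exp(-c / k).

    For each trial k the model is linear in plateau and span, so those two
    are solved in closed form and only k is scanned over k_grid.
    Returns (k, plateau, span) for the trial with the smallest squared error.
    """
    n = len(conc)
    best = None
    for k in k_grid:
        x = [math.exp(-c / k) for c in conc]
        sx, sy = sum(x), sum(f_rel)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, f_rel))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue  # x is constant for this k, so the slope is undefined
        span = (n * sxy - sx * sy) / denom
        plateau = (sy - span * sx) / n
        sse = sum((plateau + span * v - y) ** 2 for v, y in zip(x, f_rel))
        if best is None or sse < best[0]:
            best = (sse, k, plateau, span)
    if best is None:
        raise ValueError("no usable k in k_grid")
    return best[1], best[2], best[3]


# Synthetic, noise-free titration (NOT data from the paper):
# k = 0.8 uM and 70% maximal quenching.
conc_um = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
f_rel = [0.3 + 0.7 * math.exp(-c / 0.8) for c in conc_um]
k_grid = [0.05 * i for i in range(1, 101)]  # trial constants, 0.05 to 5.0 uM

k_fit, plateau, span = fit_single_exponential(conc_um, f_rel, k_grid)
print(f"k = {k_fit:.2f} uM, plateau = {plateau:.2f}, span = {span:.2f}")
```

Scanning k and solving the two linear parameters exactly keeps the sketch dependency-free; a general nonlinear regression tool would refine all three parameters simultaneously and report confidence intervals.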


Results and Discussion 

The crystal structure of SARS-CoV nsp9 is reported (Fig. 1 and
Table 1). Crystals contain a dimer in the asymmetric unit (Fig.
1 A and B); in each monomer, seven β-strands and one α-helix
(Fig. 1 A-C) are arranged into a single compact domain and form
a cone-shaped β-barrel flanked by the C-terminal α-helix. The
latter makes a 45° angle with the axis of the β-barrel and has a
high content of hydrophobic residues, yielding two hydrophobic
sides. One faces the β-barrel and the other interacts with the
α-helix of the second crystallographic monomer (Fig. 1 A and B).
This dimer is therefore assembled by hydrophobic interactions
and is further stabilized by four long hydrogen bonds involving
main-chain atoms. Comparing the buried dimerization surface of
1,632 Å² with the few other crystallographic contacts suggests
that this crystallographic dimer is also present in solution, which
is in agreement with dynamic light scattering and gel permeation
experiments (12). This surface of 1,632 Å² is within the range of
interfacial areas found in biologically relevant dimers (25). The
two molecules of the dimer are spatially similar (rms deviation
of 0.99 Å over the 109 Cα atoms of the structure). This
deviation is further reduced after exclusion of the N and C termini
together with the tips of three long loops (L23, L45, and L5’6)
emerging from the barrel (Fig. 1A, rms deviation of 0.45 Å over
the 87 Cα atoms).
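
The rms deviations quoted above come from a least-squares superposition of matched Cα atoms. The program used for that comparison is not stated, so the following is only a generic sketch of the calculation, using Horn's quaternion formulation with a simple power iteration; the function names and toy coordinates are hypothetical.

```python
import math


def _centered(coords):
    """Translate a coordinate list so that its centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(coords_a, coords_b, iterations=500):
    """Minimal RMSD between two matched atom lists after optimal rotation/translation."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 3:
        raise ValueError("need two equally long lists of at least 3 points")
    a, b = _centered(coords_a), _centered(coords_b)
    n = len(a)
    # Sum of squared distances of both sets from their centroids.
    g = sum(x * x + y * y + z * z for x, y, z in a + b)
    # 3x3 correlation matrix S[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best overlap.
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by g/2 so every eigenvalue is non-negative, then power-iterate.
    shift = g / 2.0
    m = [[nmat[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    v = [1.0, 0.5, 0.25, 0.125]  # arbitrary asymmetric starting vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    vnorm2 = sum(x * x for x in v)
    lam = sum(v[i] * sum(nmat[i][j] * v[j] for j in range(4)) for i in range(4)) / vnorm2
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


# Toy check: set_b is set_a rotated 90 degrees about z and then translated.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
set_b = [(5.0, -2.0, 1.0), (5.0, -1.0, 1.0), (3.0, -2.0, 1.0), (5.0, -2.0, 4.0)]
rigid_rmsd = superposed_rmsd(set_a, set_b)
print(f"RMSD after superposition: {rigid_rmsd:.6f}")
```

Restricting the atom lists to a core subset (for instance, dropping termini and loop tips) before calling the function mirrors the way the smaller deviation over 87 Cα atoms is obtained; power iteration can converge slowly for nearly degenerate point sets, where a full eigen-solver would be preferable.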

Screening public protein databases with BLAST or PSI-BLAST
(26) failed to identify any sequence homologue of CoV nsp9
proteins, which is consistent with their yet unknown function. No
structural homologues of nsp9 were found when scanning either
the Protein Data Bank with the DALI server (27) or the CATH
database (28) with the GRATH server. Visual inspection of the
Structural Classification of Proteins database, version 1.63 (29),
revealed some common features between SARS-CoV nsp9 and
four different existing folds (trypsin-like proteases, the C-terminal
domain of γ-transposase, α- and β-subunits of F1-ATP
synthase-like, and small protein B). Only the N-terminal six-stranded
β-barrel of the trypsin-like proteases displays the same
connectivity and spatial arrangement as nsp9 monomers, and can
be significantly superimposed [rms deviation of 1.73 Å over the
74 Cα atoms when superimposing nsp9 and thrombin structures
(PDB code 1A2C)]. This homology has no functional relevance,
however, because nsp9 lacks a trypsin-like catalytic triad. This
visual inspection did, however, reveal two other interesting
features. First, six-stranded closed β-barrels have been identified
in several proteins that interact with RNA, such as the small
protein B (30) and domain III of EF-Tu (31). Second, the
short six-stranded β-barrel of nsp9 includes an opened five-stranded
barrel reminiscent of the five-stranded β-barrel of the
oligosaccharide/oligonucleotide-binding (OB)-fold proteins.
The latter proteins form a superfamily in which two-thirds of the
members are nucleic acid-binding proteins (32).

Fig. 2. Surface analysis of SARS-CoV nsp9. (A) Electrostatic surface potential of nsp9 viewed from the same orientation as in Fig. 1A. Potential values range
from −5 kT (red) to 0 (white) and to +5 kT (blue), where k is the Boltzmann constant and T is the temperature. Accessible Lys and Arg residues are indicated. (B)
Back view from A, with the same color code. (C) Accessible surface colored according to the following: Lys or Arg are blue; Tyr, Phe, and Trp are yellow; and Val,
Leu, and Ile are orange. The orientation is the same as in A. (D) Same coloring as in C with the same orientation as in B.

Structural homology between nsp9 and small protein B (or 
domain III of EF-Tu) is not strong enough to allow a reliable 
superimposition, which would have provided an indication about 
the localization of a putative nucleic acid-binding site. Likewise, 
nsp9 and OB-fold proteins cannot easily be compared, because 
Greek key 1 of nsp9 (Fig. 1D) and the classical OB-fold Greek 
key motif (Fig. 1E) cannot be superimposed. Although there is
a structural equivalence between the Greek key 2 motif of nsp9 
(Fig. 1D) and the OB-fold Greek key, the connectivity between 
strands is different. When both motifs are superimposed, the 
canonical binding face observed in the OB fold is buried in the 
dimer interface of nsp9. For these reasons, nsp9 may be con- 
sidered as a new variant within the OB superfamily. Nonetheless, 
nsp9 displays the same features that OB-fold proteins use to bind 
nucleic acids: a network of positively charged amino acids defines 
a positive track suitable for binding the phosphate backbone to 
the protein surface (Fig. 2 A and B), whereas exposed aromatic
residues might provide stacking interactions with nucleobases 
(Fig. 2 C and D). These residues are conserved in all CoV nsp9 
sequences (Arg-10, Lys-52, Trp-53, Arg-55, Arg-74, Phe-75, 
Lys-86, Tyr-87, Phe-90, Lys-92, Arg-99, and Arg-111 in Figs. 1C 
and 2 A-D), further suggesting that nsp9 is a nucleic acid-binding
protein. In addition, two extended loops L23 and L45 display 
weak electron density associated with high B factor values, 
indicating that they are flexible and/or mobile. They line the 
positively charged track, and they may clamp nucleic acids on the 
nsp9 surface after conformational change, as observed in other 
OB-fold proteins (32). In other members of the OB-fold super- 
family, each monomer has its own, autonomous single-stranded 
nucleic acid-binding site. For example, replication protein A 
trimerizes by means of its C-terminal α-helix, each monomer
keeping an individual ssDNA-binding site acting cooperatively 
with other units of the trimer (33, 34). In nsp9, it is the dimeric 
form that provides a single, uninterrupted nucleic acid-binding 
site. 

Surface plasmon resonance was used to demonstrate the 
function of nsp9 as a nucleic acid-binding protein. Biotinylated 
oligonucleotides bound to a streptavidin-coated solid support 
are able to bind nsp9 (Fig. 3A). This function was confirmed by
fluorescence experiments. As a fluorophore, nsp9 monomer has 
a single Trp residue (Trp-53), which is partially exposed to the 
solvent. The Trp-53 indole moiety is in a polar environment 
comprising side chains of Gln-20, Gly-66, and more remotely, 
Lys-52. Interactions of Trp-53 with ligand might therefore 
quench its fluorescence. This occurrence was indeed observed by 
using ssDNA and ssRNA oligonucleotides of defined sequence. 
The quenching efficiency increased steadily when the probe size 
was increased from 6-mer to 45-mer and then reached a plateau 
(Fig. 3B). The occurrence of this plateau suggests that the nsp9 
tryptophans in the dimer achieve an optimal energy transfer 
(reflecting optimal molecular interactions) only through the 
tight packing of probes equal to or longer than the 45-mer. In
contrast, shorter probes do not result in optimal transfer, probably
due to remote or loose contacts with the tryptophans. With
both ssDNA and ssRNA, a large decrease in tryptophan fluorescence
was observed (Fig. 3C), but the emission maximum
wavelength was unchanged, indicating (i) that the interaction is
nonspecific and may involve the sugar-phosphate backbone
rather than the bases, and (ii) that the environment of Trp-53
remains polar. The apparent affinity does not depend on the
ssDNA length, because KDapp values fall between 0.63 and 1.1
µM when data are fitted to a single exponential (Fig. 3C). The
strongest quenching (90%) and the lowest KDapp (0.4 µM) are
observed with the 560-mer ssRNA, suggesting that each ssRNA
binds several nsp9 dimers and that each nsp9 dimer can bind two
distinct single-stranded segments.

Fig. 3. SARS-CoV nsp9 is an oligonucleotide-binding protein. (A) BIAcore
analysis of nsp9 binding to immobilized DNA oligonucleotides. The protein
(16 µM) was injected at a flow rate of 5 µl/min in HBS buffer on dextran layers
containing 550 and 850 resonance units (RU) of the 25- and 45-mer
oligonucleotides, respectively. The sensorgrams are the result of two independent
experiments. RU values at 125 s (5 s after the end of the injection) are
indicated. (B) Tryptophan fluorescence quenching study on SARS-CoV nsp9.
The tryptophan fluorescence quenching at the plateau (in percent) is plotted
versus the length of ssDNA probes. The KDapp values are extracted from the
plot displayed in C and are discussed in the text. (C) The relative fluorescence
of nsp9 at 340 nm is plotted as a function of the oligonucleotide concentration
for ssDNAs ranging from 6- to 79-mer and for 6- and 560-mer ssRNAs. D, DNA;
R, RNA. The KDapp values (discussed in the text) result from the fitting of the
data to a single exponential (GraphPad).

The binding of both ssDNA and ssRNA of unrelated defined 
sequences, together with Kpapp values in the micromolar range, 
suggests that the nucleic acid-binding activity of nsp9 is not 
sequence-specific. Much like the human CoV 229E helicase, 
which has RNA and DNA duplex-unwinding activities (35), nsp9 
is able to bind ssDNA or ssRNA equally, although binding of the 
latter is expected to be the native function. In the infected cell, 
the coupling/compartmentalization of the viral RNA synthesis with
the RNA-binding function of nsp9 might render RNA versus 
DNA specificity unnecessary. The wrapping of ssRNA around 
the nsp9 dimer is an interesting possibility that is compatible with 
the structural characteristics of the nsp9 dimer described here. 
An ssRNA-binding function of nsp9 is also consistent with its
natural abundance in the replication complex. Due to a ribosomal
frameshifting mechanism (36), nsp9 and other ORF1a-encoded
CoV replicase subunits are produced in 3- to 5-fold
excess relative to the “core” replicative enzymes [such as the 
RNA-dependent RNA polymerase and helicase (9) produced 
from replicase ORF1b]. For example, nsp9 might stabilize 
nascent nucleic acid during replication or transcription, thus 
providing protection from nucleases. The amount of nsp9 may 
not be enough to cover the entire ssRNA genome. The latter may 
not be entirely single-stranded, however, due to secondary RNA 
structure. Whether the ssRNA-binding function of nsp9 is
restricted to specific segments of the genome, or is complemented
by other proteins, is still an open question.
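
The 3- to 5-fold excess mentioned above follows from simple stoichiometry if one assumes that every translating ribosome yields one set of ORF1a-encoded subunits while only the fraction that frameshifts also yields ORF1b-encoded enzymes. The sketch below encodes only that assumption; it ignores differences in protein stability or processing, and the function names are hypothetical.

```python
def orf1a_excess(frameshift_efficiency: float) -> float:
    """Ratio of ORF1a-encoded to ORF1b-encoded products for a given frameshift fraction."""
    if not 0.0 < frameshift_efficiency <= 1.0:
        raise ValueError("frameshift efficiency must be in (0, 1]")
    # Every ribosome makes ORF1a products; only the frameshifting fraction makes ORF1b products.
    return 1.0 / frameshift_efficiency


def efficiency_for_excess(excess: float) -> float:
    """Frameshift fraction implied by a given ORF1a:ORF1b excess."""
    if excess < 1.0:
        raise ValueError("excess must be at least 1")
    return 1.0 / excess


low = efficiency_for_excess(5.0)   # 5-fold excess
high = efficiency_for_excess(3.0)  # 3-fold excess
print(f"implied frameshift efficiency: {low:.0%} to {high:.0%}")
```

Read this way, a 3- to 5-fold excess corresponds to roughly one ribosome in three to one in five shifting frame.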

The complexity of the RNA synthesis machinery of CoVs has 
long been predicted, considering the size of the pp1a and
pp1ab replicative polyproteins and the number of cleavage
products produced from these precursors. Recently, Snijder et al. 
(8) described a set of putative RNA processing enzymes in the 
replicase complex of CoVs, including SARS-CoV. In addition to 
mere RNA replication, nsp9 could also participate in such a 
base-pairing-driven process as RNA processing. An informative 
parallel in the virus world is observed with bacteriophage T7: its 
gene 2.5 ssDNA-binding protein binds substrates with an affinity
similar to that of SARS-CoV nsp9 [KD in the µM range (37)] and
is involved in replication/recombination/homologous base-pairing
events (38, 39). The structural and functional characterization
of nsp9 may also be relevant to SARS-CoV control: the
SARS epidemic as well as previous work on CoVs have shown
that genome plasticity (evolution by mutation and recombination)
relates to pathogenicity and probably also to drug resistance.
Because many viral and cellular single-stranded nucleic
acid-binding proteins are essential (40), nsp9 is to be added to the 
list of potential targets for anti-CoV drug design. 